Who is reporting non‐native species and how? A cross‐expert assessment of practices and drivers of non‐native biodiversity reporting in species regional listing

Abstract Each year, hundreds of scientific works with species' geographical data are published. However, these data can be challenging to identify, collect, and integrate into analytical workflows due to differences in reporting structures, storage formats, and the omission or inconsistency of relevant information and terminology. These difficulties tend to be aggravated for non-native species, given varying attitudes toward non-native species reporting and the existence of an additional layer of invasion-related terminology. Thus, our objective is to identify the current practices and drivers of the geographical reporting of non-native species in the scientific literature. We conducted an online survey targeting authors of species regional checklists (a widely published source of biogeographical data), in which we asked about reporting habits and perceptions regarding non-native taxa. The responses and the relationships between response variables and predictors were analyzed using descriptive statistics and ordinal logistic regression models. With a response rate of 22.4% (n = 113), we found that nearly half of respondents (45.5%) do not always report non-native taxa, and of those who report, many (44.7%) do not always differentiate them from native taxa. Close to half of respondents (46.4%) also view the terminology of biological invasions as an obstacle to the reporting of non-native taxa. The ways in which checklist information is provided are varied, but mainly correspond to descriptive text and embedded tables with non-native species (when given) mentioned alongside native species. Only 13.4% of respondents report always providing the data in automation-friendly formats or publishing them in biodiversity data repositories. Data on the distribution of non-native species are essential for monitoring global biodiversity change and preventing biological invasions. Despite this importance, our results show an urgent need to improve the frequency, accessibility, and consistency of publication of these data.


| INTRODUCTION
Our planet is undergoing major biodiversity and biogeographical changes (Sala et al., 2000), for which the human-mediated transportation and introduction of species outside of their native ranges is a major contributing factor (Pyšek et al., 2020; Seebens et al., 2017).
While many of the introduced species fail to establish, some eventually succeed and become a permanent addition to local ecosystems. Altogether, these "established non-native species" (hereafter referred to as "non-native species," for simplicity) already represent a substantial portion of biodiversity in many regions of the planet (McGeoch et al., 2010; Seebens et al., 2017), reshaping biogeographical patterns (Capinha et al., 2020), trophic networks (Bezerra et al., 2016), and the functioning of ecosystems (e.g., Wardle & Peltzer, 2017). In addition, a subset of non-native species also causes important negative impacts to human well-being, economy, and biodiversity (the so-called "invasive alien species"; e.g., Gallardo et al., 2016; Shackleton et al., 2019; Zenni et al., 2021).
Despite the increasing presence of non-native species in ecosystems worldwide, obtaining a comprehensive, up-to-date description of their distribution remains, in general, a difficult task. Knowledge about the distribution of these taxa is crucial to assess ongoing biodiversity changes and prevent negative impacts posed by invasive species (Brondízio et al., 2019); however, primary distribution data (i.e., records of species occurrence) are usually scattered across data sources of multiple types (e.g., scientific literature, biodiversity observation databases, technical documents, and gray literature), with each source often adopting distinct data reporting structures, publication formats, and terminology. These issues have prompted a number of recent initiatives (e.g., Darwin Core-DwC, FAIR workflow) to aggregate and standardize distribution data for non-native species (Reyserhove et al., 2020; Wieczorek et al., 2012), as well as the development of technical pipelines that aim to automate data assimilation and harmonization (e.g., Seebens & Kaplan, 2022).
Despite these efforts, a better understanding of the factors behind the lack of standardization of non-native species reporting data remains necessary to act upstream of the problem and reduce reliance on post-publication data aggregation efforts, which can be resource-consuming and difficult to maintain in the long term.
Species regional checklists are one type of data source where the issues of non-native species reporting are most evident.
These checklists (sometimes simply called species "lists" or "inventories") are a widely used approach to report the species that occur in a region at a particular time, ranging from simple, tabular listings of species to comprehensive, text- and image-supported descriptions of their local habitats, origin, and other attributes. Each year, hundreds of these checklists are published in peer-reviewed journals, constituting one of the most widespread and valuable sources of biodiversity information and enabling, for example, the direct quantification of key metrics of biodiversity change, such as species richness and taxonomic composition (Chase et al., 2019). However, several challenges hinder the widespread use of these data (Reyserhove et al., 2020), particularly concerning non-native biodiversity. Difficulties include the uneven reporting of non-native species among authors, with some reporting these species but not always identifying them as non-native, and others listing only native taxa. Furthermore, when non-native species are listed and identified, authors often use inconsistent terminology; for example, the terms "introduced," "alien," "exotic," "nonindigenous," or "invasive" can appear as synonyms or be used to distinguish between establishment or nativity statuses (Pyšek et al., 2004). These issues add to others that are not specific to non-native species, including the widespread adoption of publication formats that are not machine-readable (e.g., Portable Document Format, "PDF") and the use of unstructured data reporting structures, which can include tables, descriptive text, or a mix of both.
In this context, we analyze the patterns of non-native species reporting in regional species checklists and investigate factors potentially associated with the patterns observed. Specifically, we surveyed experts who authored or co-authored species checklists in past years concerning: (1) the practices they adopted in reporting non-native taxa; (2) their expertise and training background; and (3) perceived drivers for the practices they adopted. We analyze these data aiming to identify expert-related factors that drive commonalities in species reporting practices and to highlight the most relevant difficulties in easing access to, and harmonizing, the biogeographical data being published for non-native taxa.

| Survey approach
To investigate the practices and drivers of non-native biodiversity reporting in species checklists, we conducted an online survey aimed at authors of these lists. To identify these authors, in November 2021, we performed a literature search on Google Scholar, employing the search words: ("Species" AND "Checklist"). Because our aim was to obtain a cross-taxonomic global-scale assessment, we did not employ any terms referring to geographical areas or species groups.
We restricted the search to documents published between 2000 and 2021 and examined the title and abstract of each to identify those effectively referring to species regional checklists (i.e., a publication aimed at listing all species of a taxonomic group or groups known to occur in a geographical area). From the retained publications, we identified a total of 505 authors for whom an e-mail contact was provided (the corresponding author, or the first author when the corresponding author could not be identified; see Appendix S1). We sent an email inviting experts to participate in the survey, mentioning a response acceptance period from March 8 to April 19, 2022, totaling 6 weeks. During this period, we sent two reminders (March 29, 2022, at 03:07 a.m. and April 12, 2022, at 09:00 a.m.). In the first message, participants were also informed about the reason they were invited, the general aim of the survey, and the guarantee of anonymity of responses. The survey was checked and approved by the Ethics Committee of the Institute of Geography and Spatial Planning of the University of Lisbon.

| Survey measures
The online survey was implemented in Google Forms, in English, and was divided into six sections: an introductory section followed by sections "A" to "E." The introductory section contained information about the aim of the study, the estimated survey duration, ethics and personal data protection, and one mandatory question to confirm agreement to participate in the survey.

| Statistical analysis
We used ordinal logistic regression models to analyze the relationships between respondents' expertise and research focus and the responses concerning: (1) frequency of non-native reporting; (2) frequency of non-native status differentiation; (3) knowledge about the terms and definitions referring to biological invasions; and (4) the provision of data in machine-readable formats or their publication in biodiversity data repositories. Ordinal regression was chosen because our response variables are ordered variables representing Likert scales. In all models, we used the same set of predictors: respondents' self-assessed level of expertise in each taxonomic group (plants, vertebrates, invertebrates, microorganisms, and fungi) and environmental realm (terrestrial, freshwater, and marine), and the quantity of checklists published in each biome (tropical, subtropical, temperate, and polar). We selected these predictors because we expected taxonomy, realms, and biomes to affect the patterns of response owing to uneven levels of knowledge across these (Collen et al., 2008; Troudet et al., 2017), leading to differences in the focus given to non-native taxa. In other words, we expected that taxonomic groups and environments that have been historically better studied (e.g., vertebrates and terrestrial environments in temperate regions) would show a higher reporting of non-native taxa, and vice versa.
Concerning non-native status differentiation, the analysis used only the subset of respondents who reported including non-native species (i.e., responses "rarely" to "always"), totaling 105 answers. The ordinal logistic regressions were implemented in the R programming language (R Core Team, 2022) using the polr function from the "MASS" package (Venables & Springer, 2002).
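To illustrate how a proportional-odds (ordinal logistic) model of the kind fitted by polr turns a predictor into probabilities for each ordered response category, the following Python sketch may help. It assumes the parameterization logit P(Y ≤ k) = ζ_k − η. The response labels are taken from the answer options mentioned in the text, whereas the thresholds and the coefficient are hypothetical values chosen for illustration only; they are not estimates from this study.

```python
import math

def category_probabilities(eta, thresholds):
    """Return P(Y = k) for each ordered category under a proportional-odds model.

    Assumes logit P(Y <= k) = zeta_k - eta, where eta is the linear predictor
    (coefficient * predictor) and zeta_k are increasing thresholds (cut-points).
    """
    def logistic(z):
        return 1.0 / (1.0 + math.exp(-z))

    # Cumulative probabilities for each cut-point; the last category closes at 1.
    cumulative = [logistic(zeta - eta) for zeta in thresholds] + [1.0]
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs

# Hypothetical values, for illustration only (not estimates from the survey).
LEVELS = ["never", "rarely", "sometimes", "very often", "always"]
THRESHOLDS = [-3.0, -2.0, -1.0, 0.5]  # four cut-points for five categories
BETA_TROPICAL = -0.4                  # assumed negative effect of the predictor

for predictor_value in (0, 5):
    eta = BETA_TROPICAL * predictor_value
    probs = category_probabilities(eta, THRESHOLDS)
    print(predictor_value, dict(zip(LEVELS, (round(p, 3) for p in probs))))
```

With a negative coefficient, larger predictor values shift probability mass away from "always" and toward the lower categories, which is the direction of effect a negative relationship in such a model describes.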

| Background of respondents
Out of the 505 invitations sent via email, 42 were automatically returned for reasons such as non-existent email addresses, resulting in a non-contact rate of 8.3%. In total, we received 113 submissions (response rate = 22.4%), of which 112 were fully completed (99.1% of all submissions).
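The rates above follow directly from the reported counts. The short Python check below reproduces them; the variable and function names are ours, and the only assumption is that both the non-contact and response rates use all 505 invitations as the denominator, which is what the reported figures imply.

```python
# Counts reported in the text.
invitations_sent = 505
returned_undelivered = 42
submissions = 113
fully_completed = 112

def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)

non_contact_rate = pct(returned_undelivered, invitations_sent)
response_rate = pct(submissions, invitations_sent)
completion_share = pct(fully_completed, submissions)

print(non_contact_rate, response_rate, completion_share)  # 8.3 22.4 99.1
```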
Concerning the taxonomic expertise of participants, most had no expertise in microorganisms and fungi (81.3% and 79.5%, respectively). On the other hand, invertebrates were the taxonomic group for which high or very high expertise was most prevalent (38.4% of all participants). Vertebrates and vascular plants showed intermediate levels of high to very high expertise (25% and 20.6%, respectively).
Concerning environmental realms, the marine one had the highest proportion of respondents with "no expertise" (49.1%). On the other hand, the terrestrial realm had the highest prevalence of high to very high expertise (59.8%), followed by freshwater (23.2%), and marine areas (18.8%).

| Reporting of non-native species
Of the 112 survey respondents, slightly less than half (45.5%) indicate that they do not always include established non-native species in the checklists, with 6.3% (seven respondents) saying that they never include them (Figure 1A). When adding non-native species to the checklists, most respondents mention that they list all those they are aware of (66.1%), while 20.6% mention that they do not necessarily list all of them: 15.2% report only invasive species and 5.4% report subsets of non-native species that follow no particular criterion (Figure 1D).
The reason most often indicated as "always" applying for not including non-native species is the additional amount of work or resources required (7.3%). However, when causes reported to apply "sometimes" and "very often" are also considered, the most common reason becomes "my work is not focused on non-native species" (58.1%) (Figure 2).
Situations where no non-native species are known in the region are also commonly given as a reason for their omission from the checklists; however, this 'ideal' situation corresponds to <10% of the cases indicated as occurring "always" and <25% when "very often" is also considered.
Concerning the reporting of the non-native status of the species, 43.7% mention that they do not always provide this information (i.e., they do not differentiate native from non-native species) and 18.1% of respondents mention that they never or rarely add this differentiation (Figure 1B). However, when asked whether they agree that the omission of established non-native species from regional species lists, or the non-indication of their non-native status, is an obstacle to monitoring biogeographical and biodiversity change, most respondents agree or strongly agree (80.3%; Figure 1C).

FIGURE 1 Summary of responses concerning reporting habits of non-native species.

| Issues of terminology
Concerning issues related to terminology, we found that nearly half (46.4%) of respondents agree or strongly agree that the terminology of biological invasions is an obstacle to the addition of established non-native species in regional species checklists. A total of 42% of respondents have no opinion about this, and 11.6% disagree or strongly disagree (Figure 3A). Nonetheless, 47.4% agree or strongly agree that terminology on biological invasions and non-native species is becoming increasingly standardized and well-defined, so that the use of terms is increasingly straightforward. Most of the remaining respondents had a neutral perspective and only a few disagree with this assertion (Figure 3B). Concerning respondents' knowledge of the terms and definitions used in biological invasions, nearly 50% report having no knowledge to moderate knowledge (Figure 3C). Finally, those who include non-native species in the checklists and identify them as such do not always provide a definition for the terms used or a bibliographic reference, with 31.3% never or rarely providing it, more than those who always provide it (28.6%; Figure 3D).

FIGURE 2 Summary of responses concerning reasons for not including non-native species in checklists.

FIGURE 3 Summary of responses concerning issues of biological invasions terminology.

| Data delivery
Two-thirds of respondents indicate that the listing of species is provided as a standardized table or list, summarizing the same categories of information for all species (Figure 4A2), whereas 42.9% mention that they provide a descriptive text for each species, either solely or as a complement to a standardized table or list (Figure 4A1).
A total of 44.7% of respondents who include non-native species in the checklists report the species' non-native status using a descriptive text alongside the native species ("always" and "very often" answers). The second most used reporting option is tables alongside the native species (e.g., using symbols or other indications), with 40% of "very often" and "always" answers (Figure 4B3,B5).
Concerning the provision of the data in machine-readable formats (e.g., Excel file, CSV, or XML), only 13.4% of experts mention that they always provide these or publish the data in a standardized biodiversity data repository. Moreover, 44.6% mention that they never or rarely do this (Figure 4C). Nonetheless, 44.7% strongly agree or agree with the idea that the structure of regional species lists and the data formats used to publish them are becoming increasingly standardized, with 42.9% having no opinion about it (Figure 4D).
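As an illustration of what a machine-readable checklist could look like, the Python sketch below writes a small table to CSV using Darwin Core term names as column headers and then reads it back to list the taxa not flagged as native. This is only a minimal sketch under stated assumptions: the choice of columns is ours, the two non-native rows reuse species mentioned in the Discussion, and the native row is a placeholder name rather than a real taxon. It does not represent a format used by any survey respondent.

```python
import csv
import io

# Column names follow Darwin Core term names; the selection of columns is an
# assumption made for this example.
FIELDS = ["scientificName", "countryCode", "establishmentMeans", "occurrenceStatus"]

ROWS = [
    # Non-native examples taken from species mentioned in the Discussion.
    {"scientificName": "Paralithodes camtschaticus", "countryCode": "NO",
     "establishmentMeans": "introduced", "occurrenceStatus": "present"},
    {"scientificName": "Asterias amurensis", "countryCode": "AU",
     "establishmentMeans": "introduced", "occurrenceStatus": "present"},
    # Placeholder native taxon (hypothetical name, not a real species).
    {"scientificName": "Placeholder nativus", "countryCode": "NO",
     "establishmentMeans": "native", "occurrenceStatus": "present"},
]

def checklist_to_csv(rows, fields=FIELDS):
    """Serialize checklist rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

def non_native_taxa(csv_text):
    """Return the names of taxa whose establishmentMeans is not 'native'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["scientificName"] for row in reader
            if row["establishmentMeans"] != "native"]

csv_text = checklist_to_csv(ROWS)
print(csv_text)
print(non_native_taxa(csv_text))
```

Because the status is stored in a dedicated column rather than in running text or table symbols, filtering native from non-native taxa becomes a one-line operation.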

| Regression analyses
Ordinal logistic regression identified a significant (α = 0.05) negative relationship between the quantity of checklists published for tropical biomes and the frequency of reporting of non-native species (Figure 5a). No significant relationships were found between this frequency and taxonomic or environmental realm expertise.
When non-native species are included in the checklists, the reporting of their non-native status appears to be significantly more common among experts working in the marine realm (Figure 5b).
Higher expertise in plants, vertebrates, invertebrates, and the freshwater environmental realm is significantly and positively related to greater knowledge of the terms and definitions of biological invasions (Figure 5c). No significant relationship was found between respondents' expertise or research focus on distinct biomes and the frequency of provision of data in machine-readable formats (Figure 5d).

| DISCUSSION
This work provides a detailed assessment of the attitudes and practices of non-native species reporting by authors of regional species checklists. We found that nearly half of these authors do not always include non-native species in their checklists.
The high frequency of omission of non-native species from regional checklists is surprising and takes place despite most checklist authors agreeing that it hinders the monitoring of biogeographical and biodiversity change. Moreover, this omission appears to occur largely irrespective of the taxonomic groups, environmental realms, and biomes of focus of the authors. The only exception concerns authors with a greater focus on tropical biomes, who tend to omit non-native species more frequently. This relationship may be explained by a primary focus by these authors on listing only native taxa and discovering new species, as these areas are expected to retain the vast majority of terrestrial species that remain undescribed (Giam et al., 2012). In fact, a relevant number of checklist authors have indicated that the omission of non-native taxa is simply due to the focus of their work not including these taxa.
Another reason often given is the amount of additional resources that would be required to include these taxa, a factor that may be even more pressing in tropical areas, where pre-existing levels of biodiversity information are generally lower and the resources needed for undertaking new field surveys are expected to be higher.
Intriguingly, we found no relationship between the frequency of non-native species reporting and the level of research focus on temperate regions. Arguably, a greater focus on these areas should be positively related to the propensity to report non-native taxa, as temperate regions host the greatest richness of non-native taxa and the highest ratios of non-native to native biodiversity (Sax, 2001). At the same time, these areas also host most of the economically developed countries, which have greater resources for biodiversity monitoring (Pereira et al., 2010). One possible explanation is that, despite their prominence in these regions, non-native species are generally regarded as a "distinct" assemblage, even among ecologists, and are often reported in isolation from native species, even though they play relevant roles in biological communities and represent an important portion of local biodiversity (Schlaepfer, 2018).
That is, although the reporting of non-native species occurs, it is done in isolation from native species, making it difficult to obtain aggregated information about the dynamics and trends of biodiversity in these areas.

FIGURE 4 Summary of responses concerning data delivery.
Relevantly, checklists that include non-native taxa often omit their non-native status, a practice that also occurs largely regardless of the background of the checklist authors. The one exception is experts working in marine environments, who have a significant propensity to differentiate between native and non-native taxa on checklists. The reason for this relationship is unclear. However, despite the paucity of overall knowledge about marine invasions (Ojaveer et al., 2015), marine and coastal ecosystems have been severely impacted by a set of prominent invasive species, which often also massively affect human activities, either positively (e.g., the red king crab Paralithodes camtschaticus, a non-native species that is an important source of income for the Norwegian seafood industry; Lorentzen et al., 2018) or negatively (e.g., the sea star Asterias amurensis, which feeds on commercially valuable species in southern Australia; Ross et al., 2002). One possibility is that the societal notoriety of some of these species may have contributed to an overall awareness of the relevance of reporting their distribution and that of other non-native species.
FIGURE 5 Results of ordinal logistic regressions, relating patterns of variation in answers and levels of expertise in distinct taxonomic groups, environmental realms, and quantity of checklists published in each biome. The responses concern (a) the frequency of inclusion of non-native species in the checklists; (b) the identification of non-native species status; (c) knowledge about terms and definitions of biological invasions; (d) the provision of data in machine-readable formats or publication in biodiversity data repositories (see Appendix S4).

A relevant portion of our survey participants view biological invasion terminology as an obstacle to the addition of non-native species to regional checklists. Terminology and associated definitions are a key facet of invasive species reporting (Golebie et al., 2022), but also a recognized source of challenges, ambiguities, and disagreements among invasion ecologists (e.g., Essl et al., 2018; McGeoch et al., 2012; Wilson, 2020), which can be understood in part by the youth of biological invasions as a scientific discipline and the underlying conceptual differences among the fields that comprise invasion research, such as biogeography, conservation, ecology, and evolutionary biology (Heger et al., 2013). However, more than 10 years have passed since notable efforts to harmonize terms and definitions were published (Blackburn et al., 2011; Richardson et al., 2011), and there is now a prevailing view that these terms and definitions are increasingly standardized and well-defined among the research community (Golebie et al., 2022), a view that is contradicted by only a very small percentage of our survey participants. Thus, it seems likely that a harmonized use of invasion terminology will become increasingly prevalent in regional species checklists as time progresses. However, we also found that, for the time being, a large percentage of checklist authors still have limited knowledge of this terminology. For example, significant positive relationships between taxonomic expertise and levels of terminology knowledge were found for plants, vertebrates, and invertebrates, but not for fungi and microorganisms, groups for which knowledge about invasions is difficult to obtain and remains limited (Litchman, 2010; Monteiro et al., 2022).
Thus, efforts to improve knowledge of this terminology and its harmonized adoption among checklist authors seem justified. This could be performed, in part, by the publication of terminology guidelines aimed specifically at biogeographical data providers (such as authors of species checklists), similar to what has recently been done for other application areas such as policy and management (Essl et al., 2018).
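One way such a guideline might be put into practice by data aggregators is as an explicit lookup from the free-text status terms found in checklists to a small controlled vocabulary. The Python sketch below is a hypothetical illustration: the mapping, the target labels, and the function name are our assumptions, not a published standard. In particular, treating "introduced," "alien," "exotic," and "nonindigenous" as synonyms is a simplification, since, as noted in the Introduction, authors sometimes use these terms to distinguish between statuses.

```python
# Hypothetical mapping from free-text terms to a simplified status vocabulary.
# The target labels are assumptions made for this example.
TERM_TO_STATUS = {
    "native": "native",
    "indigenous": "native",
    "non-native": "non-native",
    "nonnative": "non-native",
    "introduced": "non-native",
    "alien": "non-native",
    "exotic": "non-native",
    "nonindigenous": "non-native",
    "non-indigenous": "non-native",
    # Invasive species are treated here as a subset of non-native species.
    "invasive": "non-native (invasive)",
}

def normalize_status(raw_term):
    """Map a free-text status term to the simplified vocabulary.

    Terms that are not in the mapping are returned as 'unresolved' so that
    they can be reviewed manually instead of being guessed.
    """
    key = raw_term.strip().lower()
    return TERM_TO_STATUS.get(key, "unresolved")

for term in ["Exotic", " alien ", "Invasive", "cryptogenic"]:
    print(repr(term), "->", normalize_status(term))
```

Returning "unresolved" for unknown terms, rather than a default status, keeps ambiguous cases visible for manual review.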
Our results also show that data in checklists are predominantly published using unstructured data types and are only rarely submitted to biodiversity data repositories. These practices are sub-optimal for data discovery and integration into analytical workflows, often requiring the use of manual procedures (Reyserhove et al., 2020). Furthermore, the inhibitory effects of these practices are intensified by the increasing volume of scientific publications (Bornmann & Mutz, 2015), a trend that is also likely occurring for species checklists. Thus, although regional species checklists are a prime source of information on biodiversity status and trends, often constituting the most comprehensive source of information for an area, including for non-native species, the use of this information can pose major analytical challenges, particularly if information contained in multiple lists is to be integrated. The ideal way to resolve these difficulties is undoubtedly through increasing standardization of the data structures and formats used to provide and store the data, as is already being promoted by some journals publishing species checklists (e.g., Biodiversity Data Journal), as well as by publishing data in standardized data repositories such as GBIF (gbif.org). Although this path is clearly preferable, widespread adoption of these practices is unlikely to be achieved in full or in the near future. In this context, procedures for the automatic identification and extraction of data from unstructured data sources, such as PDF versions of published articles, may assume significant relevance. The ability to process unstructured data has grown greatly in recent years, driven largely by new computational tools, such as natural language processing models, capable of interpreting written text and extracting relevant information (Farrell et al., 2022).
To date, these technologies have not yet provided a fully functional means of extracting complex custom-based information from ecological and biogeographic literature, but recent advances, such as identifying species names in publications (Jarić et al., 2020;Le Guillarme & Thuiller, 2022), suggest that this may soon become possible.
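To make the extraction problem concrete, the Python sketch below shows a deliberately naive, pattern-based approach: it looks for strings shaped like Latin binomials in sentences that also contain a status term. It is much simpler than the natural language processing methods cited above and is not the approach used in those works. The regular expression, the list of status terms, and the sample text are all assumptions made for illustration.

```python
import re

# Naive pattern for a Latin binomial: capitalized genus + lowercase epithet.
# It will also match ordinary phrases such as "The species" (false positives).
BINOMIAL = re.compile(r"\b([A-Z][a-z]+ [a-z]{3,})\b")

# Illustrative status terms; a real application would need a curated list.
STATUS_TERMS = ("non-native", "introduced", "alien", "exotic", "invasive")

def candidate_non_native_mentions(text):
    """Return (candidate binomial, status term) pairs found in the same sentence."""
    hits = []
    # Crude sentence split; abbreviations such as "e.g." will break it.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        lowered = sentence.lower()
        terms = [term for term in STATUS_TERMS if term in lowered]
        if not terms:
            continue
        for match in BINOMIAL.finditer(sentence):
            hits.append((match.group(1), terms[0]))
    return hits

# Made-up sample text, for illustration only.
sample = ("In Norway, the red king crab Paralithodes camtschaticus is a "
          "non-native species. Mytilus edulis occurs along the coast.")
print(candidate_non_native_mentions(sample))
```

Even on this small sample, the limits are apparent: the pattern relies on capitalization and sentence boundaries, cannot tell which species a status term refers to when several appear in one sentence, and ignores negation ("not considered invasive"). These are some of the reasons why extracting custom information from checklists remains difficult.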
We investigated the attitudes and practices of reporting of non-native species in regional species checklists. While constituting a prime source of biogeographical information, regional species checklists still pose a number of pressing difficulties for biodiversity change and non-native species research, including the total or partial omission of non-native taxa and of associated information (such as native status), uncertainty and ambiguity in the terms and definitions used to classify these species and the common provision of information in an unstructured manner, limiting data assimilation and integration into automated workflows.
Despite these challenges, the prevailing view among authors is that the addition of non-native taxa to the checklists is valuable for assessing ongoing biogeographical and biodiversity changes. Thus, the time seems ripe for raising awareness about the relevance of listing these taxa and the adoption of standardized terminology and data sharing practices.

ACKNOWLEDGMENTS
We would like to thank all the experts who generously have partici-

CONFLICT OF INTEREST STATEMENT
No conflict of interest is declared.

DATA AVAILABILITY STATEMENT
The data used in this study include human subject data that cannot be shared publicly.